Understanding Summarization using VIDIZMO Speech & Text Analyzer
Summaries provide a concise and focused snapshot of your content, enabling viewers to quickly grasp the key information without sifting through unnecessary details. VIDIZMO lets you generate summaries for your content using the VIDIZMO Speech & Text Analyzer app. It allows you to create summaries of documents with selectable text, and transcribed audio or video files on your Portal.
The application performs abstractive summarization: it identifies the essential ideas in the input text and generates a coherent, comprehensive summary of them. You can also configure the VIDIZMO Speech & Text Analyzer app to summarize content automatically when you upload it to your Portal.
You can generate and regenerate summaries for your content at any time using on-demand processing. Summaries are displayed in a separate tab on the audio or video’s playback page, where you can edit them according to your preferences. You can also moderate the content of a summary by specifying forbidden words in the VIDIZMO Speech & Text Analyzer.
For more details regarding the configurations, see Configuring the VIDIZMO Speech & Text Analyzer for Summarization.
Concept
The VIDIZMO Speech & Text Analyzer can be enabled to generate summaries of documents with selectable text, and transcribed audio and videos on your Portal. This functionality works on transcriptions generated from other indexing applications such as Azure ARM or AWS Indexer.
Abstractive Summarization
The summarization feature uses an AI model to perform abstractive summarization. This differs from extractive summarization, in which a model selects the sentences it finds important from the input text and reproduces them in the summary without changing the words, vocabulary, or structure. Because an abstractive model composes the summary itself rather than copying sentences, it provides a more coherent and well-structured summary of the input text.
In abstractive summarization, the model processes the input text to get a general idea of the subjects it covers and represents those core ideas in the output summary. This also means that the summary generated by the AI model may differ from the input text in sentence structure, vocabulary, and phrasing.
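For contrast, the extractive approach described above can be sketched in a few lines. This is only a toy illustration and is not how the VIDIZMO Speech & Text Analyzer works: the function name `extractive_summary` and the word-frequency scoring are assumptions chosen for simplicity. Note that every sentence it returns is copied verbatim from the input, which is exactly what an abstractive model does not do.

```python
import re
from collections import Counter

def extractive_summary(text, n=1):
    """Toy extractive summarizer: return the n highest-scoring sentences, unchanged."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count how often each word occurs in the whole text.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        # A sentence scores higher when it contains frequently used words.
        return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Keep the selected sentences in their original order.
    return [s for s in sentences if s in top]

text = "Budget talks ran long. The budget was approved. Lunch followed."
print(extractive_summary(text, n=1))
```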
Supported Languages
The VIDIZMO Speech & Text Analyzer can generate summaries for content in languages other than English. You can generate summaries in widely spoken languages such as:
- German
- French
- Italian
- Portuguese
- Hindi
- Spanish
- Thai
- Arabic
Account Metrics
In VIDIZMO, you can access metrics related to the consumption of various resources. One of the resources is AI Processing, which measures the number of AI processing activities you have carried out across your VIDIZMO Account.
The summarization feature falls under the "AI Processing" category, which means that your AI processing consumption increases every time you perform this activity.
For more information regarding reports, visit Consumption Reports for Deployment Overview.
Summarization Process
The summarization process begins with data preprocessing. In this phase, the model prepares the input text for summarization by removing special characters or tokens and stripping excess whitespace (such as the blank space between paragraphs).
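As a rough illustration of this kind of cleanup, here is a minimal Python sketch. The function name `preprocess` and the exact set of characters it keeps are assumptions made for the example; the actual preprocessing rules used by the application are not documented here.

```python
import re

def preprocess(text: str) -> str:
    """Illustrative cleanup: drop special characters and collapse whitespace."""
    # Remove anything that is not a word character, whitespace, or basic punctuation.
    text = re.sub(r"[^\w\s.,;:!?'-]", "", text)
    # Collapse runs of whitespace (including blank lines between paragraphs) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

raw = "Welcome   to the ## meeting!\n\n\nToday's agenda:\t budget."
print(preprocess(raw))  # Welcome to the meeting! Today's agenda: budget.
```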
Once preprocessing has cleaned the data, the model performs tokenization, in which the input text is broken into smaller components called tokens, each of which represents a word or sub-word. Each token is then assigned a unique token ID, which the AI model uses to identify that specific word or sub-word.
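The idea of mapping tokens to IDs can be sketched as follows. This is a simplified, word-level example built on assumptions: the helper names `tokenize` and `build_vocab` are hypothetical, and production models typically rely on a pre-trained sub-word vocabulary rather than one built on the fly.

```python
def tokenize(text):
    """Split text into lowercase word tokens (a stand-in for sub-word tokenization)."""
    return text.lower().split()

def build_vocab(tokens):
    """Assign each distinct token a unique integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

tokens = tokenize("the meeting covered the budget")
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

print(tokens)     # ['the', 'meeting', 'covered', 'the', 'budget']
print(token_ids)  # [0, 1, 2, 0, 3]
```

Both occurrences of "the" map to the same ID, which is what lets a model treat them as the same word.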
The AI model used by the VIDIZMO Speech & Text Analyzer performs summarization through encoding and decoding. The encoder converts the tokens created from the input text into hidden states that carry additional information about the words or sub-words. The decoder then uses these hidden states to make accurate predictions for the words to use as it generates the summary.
Forbidden Words Parameter
The Forbidden Words parameter is used for content moderation, ensuring that specific words are avoided in the generated output text. The primary use case is where adherence to strict language guidelines or policies is required.
Similar to the tokenization of the input text (i.e., the transcription), the model also tokenizes the words in the forbidden words list and assigns each one a unique token ID. The model then uses these token IDs to determine which words to avoid generating in the output text (i.e., the summary).
When the model predicts the next likely word during summary generation, it checks the token IDs of the predicted words against the token IDs of forbidden words. If there's a match, the model will ignore that predicted word and choose the next most probable word or phrase to represent that information, without distorting the general idea of the input text.
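The check described above can be sketched for a single prediction step. Everything in this example is an assumption made for illustration: the tiny vocabulary, the candidate scores, and the function name `pick_next_token` are invented, and a real model scores its entire vocabulary at every step.

```python
def pick_next_token(scores, forbidden_ids):
    """Return the highest-scoring token ID that is not in the forbidden set."""
    allowed = {tid: s for tid, s in scores.items() if tid not in forbidden_ids}
    if not allowed:
        raise ValueError("every candidate token is forbidden")
    return max(allowed, key=allowed.get)

# Hypothetical vocabulary mapping words to token IDs.
vocab = {"great": 0, "terrible": 1, "poor": 2, "fine": 3}
id_to_word = {i: w for w, i in vocab.items()}

# Tokenize the forbidden words list into token IDs.
forbidden_ids = {vocab[w] for w in ["terrible"]}

# Hypothetical scores for the next word; "terrible" is the most probable.
scores = {0: 0.10, 1: 0.55, 2: 0.30, 3: 0.05}

next_id = pick_next_token(scores, forbidden_ids)
print(id_to_word[next_id])  # poor
```

Because the top candidate matches a forbidden token ID, the sketch falls back to the next most probable word.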
Key Consideration
When entering values for this parameter, provide each word as a separate, independent entry. Phrases and sentences are invalid input and will not work for content moderation.
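If you prepare a forbidden words list outside the Portal, a quick check like the one below can help catch phrases before you enter them. This is a hypothetical helper, not part of the VIDIZMO application; it simply treats any entry containing more than one whitespace-separated word as invalid.

```python
def split_valid_entries(entries):
    """Separate single-word entries from phrases, sentences, and blank entries."""
    valid, rejected = [], []
    for entry in entries:
        word = entry.strip()
        # A valid entry is exactly one word with no internal whitespace.
        if word and len(word.split()) == 1:
            valid.append(word)
        else:
            rejected.append(entry)
    return valid, rejected

valid, rejected = split_valid_entries(["budget", "top secret", " salary "])
print(valid)     # ['budget', 'salary']
print(rejected)  # ['top secret']
```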
To see how you can utilize the summarization feature on your Portal, visit How to Perform Summarization using VIDIZMO Speech & Text Analyzer.